Refactor graph traversal in GlobalEffects#8593
Conversation
113c475 to
a0ec0cf
Compare
a0ec0cf to
e7e6e33
Compare
|
Interesting idea about using the SCCs to further optimize the propagation. We already have a SCC utility that would hopefully make this fairly easy to experiment with. And if we need to detect cycles to add an extra trapping effect, then we could just look for self-edges and nontrivial SCCs. But let's not do that here! |
tlively
left a comment
There was a problem hiding this comment.
Nice! Glad to see that this approach speeds things up.
| std::map<Function*, FuncInfo>& funcInfos) { | ||
|
|
||
| std::unordered_set<std::pair<Name, Name>> processed; | ||
| std::deque<std::pair<Name, Name>> work; |
There was a problem hiding this comment.
Did you compare using a deque with using a UniqueDeferredQueue or a stack vector?
In DAE2 I found that using a stack vector was significantly faster than using UniqueDeferredQueue, but I don't think I tried a deque.
| if (!calleeEffects) { | ||
| callerEffects.reset(); | ||
| return; |
There was a problem hiding this comment.
Why is this correct? The caller might have effects from its own body or from other callees.
There was a problem hiding this comment.
nullopt here actually means "don't know" or "all effects" rather than "no effects". If we don't know the effects of the callee then we also don't know the effects of the caller.
I agree that this is misleading though. I think I can introduce a static method in EffectAnalyzer that gives back an EffectAnalyzer that contains all effects, that way we don't need an optional at all here and the logic stays more uniform, we just always merge unconditionally. Does that sound good?
There was a problem hiding this comment.
Yes, that sounds much simpler 👍
| if (callee == caller) { | ||
| auto& callerEffects = funcInfos.at(module.getFunction(caller)).effects; | ||
| if (callerEffects) { | ||
| callerEffects->trap = true; |
There was a problem hiding this comment.
IIUC, this does not handle mutual recursion the same way the previous implementation did. The previous implementation could find arbitrary recursion cycles by just looking at self edges in the transitive closure of the reverse call graph. But we don't have that transitive closure of the graph anymore, so looking for self edges is no longer sufficient to find nontrivial recursion cycles.
(So actually we may want to use the SCC utility to find cycles here after all 🫣)
There was a problem hiding this comment.
The logic here is basically the same as the way we computed the transitive closure before, the difference is that we just don't materialize the graph of the transitive closure. I think the case you're saying is tested here (or let me know if you have a different case in mind).
In this case we have 1 -> 2 and 2 -> 1 in the work list. Then we pop off 1 -> 2 and push 1 -> 1 which ends up hitting the cycle detection.
There was a problem hiding this comment.
Oh I see... but isn't this inefficient? If we have call graph A -> B -> C, after we merge C's effects into B and then merge B's effects into A, we know that merging C's effects into A would not change anything.
But landing this and then iterating with smaller changes sounds fine, especially since this already improves on the status quo.
| } | ||
|
|
||
| for (const Name& callerCaller : callerCallers->second) { | ||
| if (processed.contains({callee, callerCaller})) { |
There was a problem hiding this comment.
We can use the bool returned from emplace to avoid this separate call to contains.
| // Even if nothing changed, we still need to keep traversing the callers | ||
| // to look for a potential cycle which adds a trap affect on the above | ||
| // lines. |
There was a problem hiding this comment.
I think it would also be possible to look at whether anything was changed by either the cycle-handling code above or the effect merge here. If not, then we don't need to propagate further. (This will be even simpler if we find cycles and apply the trap effect as a separate step before computing this fixed point.)
Refactor GlobalEffects to not compute the transitive call graph explicitly but instead aggregate effects as we go. This improves the runtime of the pass by 4.3% on calcworker (1.21792 s -> 1.16586 s averaged over 20 compilations). It also helps prepare the code for future changes to support effects for indirect calls.
Another potential future improvement here is to use SCC, which would let us stop processing children early in cases where there are no effects to update. Currently we can't do this because we add trap effects to potentially-recursive call loops, so even if no effects were updated, we need to keep going to find potential cycles.